Exploration of Effective Potential Landscapes 
using Coarse Reverse Integration 

Thomas A. Frewen* Gerhard Hummer^ loannis G. Kevrekidis-'- 

September 24, 2008 



Abstract 

We describe a reverse integration approach for the exploration of low-dimensional 
effective potential landscapes. Coarse reverse integration initialized on a ring of coarse 
states enables efficient "navigation" on the landscape terrain: escape from local effec- 
tive potential wells, detection of saddle points, and identification of significant transi- 
tion paths between wells. We consider several distinct ring evolution modes: backward 
stepping in time, solution arc-length, and effective potential. The performance of 
these approaches is illustrated for a deterministic problem where the energy landscape 
is known explicitly. Reverse ring integration is then applied to "noisy" problems where 
the ring integration routine serves as an outer "wrapper" around a forward-in-time 
inner simulator. Three versions of such inner simulators arc considered: a system of 
stochastic differential equations, a Gillespie-type stochastic simulator, and a molecular 
dynamics simulator. In these "equation-free" computational illustrations, estimation 
techniques are applied to the results of short bursts of "inner" simulation to obtain 
the unavailable (in closed form) quantities (local drift and diffusion coefficient esti- 
mates) required for reverse ring integration; this naturally leads to approximations of 
the effective landscape. 
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1 Introduction 



When an energy landscape perspective is applicable, the dynamics of a complex system 
appear dominated by gradient-driven descent into energy wells, occasional trapping in deep 
minima, and transitions between minima via passage over saddle points through thermal 
"kicks" . A paradigm for this landscape picture is the trapping of protein configurations in 
metastable states en route to the dominant folded state. The underlying energy landscape 
is often likened to a roughened funnel with trapped states corresponding to local free energy 
minima'^. 

Important features on energy surfaces include local minima and their associated basins of 
attraction, saddle points, and minimum energy paths (MEPs) between neighboring minima 
passing through these saddles. Besides the identification of such landscape features, estab- 
lishing the details of their connectivity is a task of considerable importance. Knowledge of 
the relative depths of landscape minima provides thermodynamic information. The kinetics 
of transitions between such states is determined by the type of "terrain" (smooth, rugged, 
etc.) that surrounds and separates them, in particular by the location and height of the 
low-lying saddles. The identification of important low energy molecular conformations in 
computational chemistryl^l, and determination of protein and peptide folding pathways^, to 
name but a few, rely on an ability to perform intelligent, targeted searches of the energy 
landscape. 

Molecular dynamics (MD) and Monte Carlo (MC) simulations on energy landscapes are 
typically limited in the time scales they can explore by the difference between system thermal 
energy and the height of transitional energy barriers. A significant fraction of MD and MC 
simulation time is spent "bouncing around" in local minima. Energy barriers separating 
minima cause this type of trapping and the result is long waiting times between infrequent, 
but interesting, transition events. An array of techniques have been proposed to overcome 
such time scale limitations including bias-potential approaches'^, accelerated dynamics'^, 
coarse- variable dynamics'^, and transition path sampling'^'^, allowing extensive exploration 
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of the energy surface and its transition states. The adaptive bias force method'^^^ efficiently 
samples configurational space in the presence of high free energy barriers via estimation 
of the force exerted on the system along a well-defined reaction coordinate. Short bursts 
of appropriately initialized simulations are used in coarse-variable dynamics'^ to infer the 
deterministic and stochastic components of motion parametrized by an appropriate set of 
coarse variables. The use of a history-dependent bias potential in Ref. 13 ensures that minima 
are not revisited, allowing for efficient exploration of a free energy surface parametrized 
by a few coarse coordinates. Accelerated dynamics methods such as hyperdynamics and 
parallel replica dynamics^ "stimulate" system trajectories trapped in local minima to find 
appropriate escape paths while preserving the relative escape probabilities to different states. 
Transition path sampling generalizes importance sampling to trajectory space and requires 
no prior knowledge of a suitable reaction coordinate (see also transition interface samplingfl^. 

Many energy landscape search methods have been devised (too numerous to discuss in 
detail here). "Single-ended" search approaches (where only the initial state is known) are 
based on eigenvector-following (mode-following)'2n3IlSIMiZIlll ^nd have been used to refine 
details of minimum energy paths (MEPs) close to saddle point^^^'^ni methods purely for 
efficient saddle point identification also exist'^^^^^'^. Chain-of-state methods are a more recent 
class of double-ended searches that evolve a chain of replicas (system states or "images"), 
distributed between initial and final states, in a concerted manner'235. The original elastic 
band methodl^^l^ has been refined and extended many timesl^^l^. More recently string 
method^^SESEI]^ which evolve smooth curves with intrinsic parametrization, have been used 
to locate significant paths between two states. The Global Terrain approach of Lucia and 
coworkers'^2'^ exploits the inherent connectedness of stationary points along valleys and 
ridges on the landscape for their systematic identification. 

We build here on the "equation-free" formalism of Ref. 35 whose purpose is to enable the 
performance of macroscopic tasks using appropriately designed computational experiments 
with microscopic models. The approach focuses on systems for which the coarse-grained. 
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effective evolution equations are assumed to exist but are not available in closed form. One 
example is the case of "legacy" or "black-box" codes, dynamic simulators which, given 
initial conditions, integrate forward in time over an interval At. Alternatively, the effective 
evolution equation for the system may be the unknown closure of a microscopic simulation 
model such as kinetic MC or MD. Rico-Martinez et al^ have used reverse integration in 
conjunction with microscopic forward-in-time simulators to access reverse time behavior of 
coarse variables (see also Ref. 37). Hummer and Kevrekidi^ used coarse reverse integration 
to trace a one- dimensional effective free energy surface (and to escape from its minima) for 
alanine dipeptide in water. In this paper we use reverse integration in two dimensions: a 
ring of system initial states is evolved (forward in time in the "inner" simulation, and then 
reverse in the "outer", coarse integration) to explore two-dimensional Y>oie\itiaX energy (and, 
ultimately, free energy) surfaces. The ring is evolved along the component of the local energy 
gradient (projected normal to the ring) while a nodal redistribution scheme is used that slides 
nodes along the ring so that they remain equidistributed in ring arclength, ensuring efficient 
sampling. Transformation of the independent variable in our basic ring evolution equation 
results in several distinct stepping modes. 

The paper is organized as follows. In Section[2]we present our reverse ring integration ap- 
proach. Ring evolution equations are developed with time, arc-length, or (effective) potential 
energy as the independent variable. We illustrate these stepping modes for a deterministic 
problem with a smooth energy landscape (Miiller-Brown potential). In Section |3] reverse 
ring integration is investigated for three "noisy" problems: a system of stochastic differ- 
ential equations (SDEs), a Gillespie-type stochastic simulation, and a molecular dynamics 
simulation of a protein fragment in water. Estimates of the quantities in the ring evolution 
equation are found by data processing of the results of appropriately initialized short bursts 
of the "black-box" inner simulator. The extension to stepping in free energy is discussed. We 
conclude with a brief discussion of the results and of the potential extension of the approach 
to more than two coarse dimensions. 
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2 Reverse integration on energy surfaces 

Here we present a method for (low-dimensional) landscape exploration motivated by re- 
verse projective integration, on the one hand, and by algorithms for the computation of 
(low-dimensional) stable manifolds of dynamical systems on the other. Reverse projective 
integration!^ uses short bursts of forward in time simulation of a dynamical system to esti- 
mate a local time derivative of the system variables, which is then used to take a large reverse 
projective time step via polynomial extrapolation. This type of computation is intended for 
problems with a large separation between many fast stable modes and a few (stable or un- 
stable) slow ones; the long-term dynamics of the problem will then lie on an attracting, 
invariant "slow (sub)manifold". Reverse projective integration allows us to compute "in 
the past" , approximating solutions on this slow manifold by only using the forward-in-time 
simulation code. 

After each reverse projective step the reverse solution will be slightly "off manifold" (see 
Figure 1); the initial part of the next short forward burst will then bring the solution back 
close to the manifold, while the latter part of the burst will provide the time derivative es- 
timate necessary for the next backward step. One clearly does not integrate the full system 
backward in time (the fast stable modes make this problem very ill-conditioned); it is the 
slow "on manifold" backward dynamics that we attempt to follow. The approach can be used 
for deterministic dynamical systems of the type described; however, it was developed having 
in mind problems arising in atomistic/stochastic simulation where the dynamic simulator is 
a molecular dynamics or kinetic Monte Carlo code. If the dynamics can be well described 
by a (low-dimensional) effective ODE, or an effective SDE, characterized by a potential (and 
by a (low-dimensional) effective free energy surface), reverse projective integration can be 
implemented as an "outer" algorithm, wrapped around the high-dimensional "inner" deter- 
ministic/stochastic simulator. The combination of short bursts of fine scale "inner" forward 
in time simulation with data processing and estimation and then with coarse-grained "outer" 
reverse integration can then be used to systematically explore these effective potentials (and 
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associated effective free-energy surfaces). 

A natural set of protocols for such an exploration has already been developed (in the 
deterministic case) in dynamical systems theory - indeed, algorithms for the computation 
of low-dimensional stable manifolds of vectorfields provide the "wrappers" in our context 
(see the review in Ref. 38). This is easily seen in the context of a two-dimensional gradient 
dynamical system: an isolated local minimum of the associated potential is a stable fixed 
point and, locally, the entire plane is its stable manifold; the potential is a function of the 
points on this plane. In our 2-dimensional case, we approximate this stable manifold by a 
linearization in the neighborhood of the fixed point - this could be in the form of a ring of 
points surrounding the fixed point. One can then integrate the gradient vectorfield forward 
or backward in time (see Figure 2) keeping track of the evolution of this initial ring; using 
the gradient nature of the system, one can compute, as a byproduct, the potential profile. 

Various versions of such "reverse ring integration" have been previously used for visu- 
alizing two-dimensional stable manifolds of vector fields. Johnson et al.'^ evolved a ring 
stepping in space-time arclength (see below) with empirical mesh adaptation and occasional 
addition of nodes to preserve ring resolution, building up a picture of the manifold as the 
ring expands. Guckenheimer and Worfolk^'' used algorithms based upon geodesic curve con- 
struction to evolve a circle of points according to the underlying vectorfield. A survey of 
methods for the computation of (un)stable manifods of vectorfields can be found in Ref. 38, 
including approaches for the approximation of fc-dimensional manifolds. In this paper we will 
restrict ourselves to the two-dimensional case (and thus, eventually, explore two-dimensional 
effective free energy surfaces). 

Clearly, forward integration of our ring constructed based on local linearization around 
an isolated minimum, will generate a sequence of shrinking rings converging to the minimum 
(stable fixed point). For a two-dimensional gradient vectorfield, backward (reverse) ring 
integration will "grow" the ring - and as it grows on the plane, the potential on the ring 
evolves "uphill" in the initial well, possibly toward unstable (saddle-type) stationary points. 
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A critical issue in tracking the reverse evolution of such a ring is its distortion, as different 
portions of it evolve with different rates along the "stable manifold" (here, the plane). Deal- 
ing with the distortion of this closed curve and the deformation of an initially equidistributed 
mesh of discretization points on it requires careful consideration; similar problems arise, and 
are elegantly dealt with, in (forward in time) computations with the string method'^^'^. 
While we will first implement our reverse ring integration on a deterministic gradient prob- 
lem (for descriptive clarity), our aim is to use it as a wrapper around atomistic/stochastic 
inner forward-in-time simulators; three such illustrations will follow. 

2.1 The deterministic two-dimensional case 

Consider a simple, two-dimensional gradient system of the form 



dx 

~dt 



dx/dt 
dy/dt 



-VV{x,y). (2.1) 



In this case, since the vectorfield is explicitly available, with a; in M^, we can perform reverse 



integration by simply reversing the sign of the right hand side of equation (2.1); reverse 
projective integration will only become necessary in cases where the (effective) potential is 
not known, and the corresponding gradients need to be estimated from forward runs of a 
many-degree-of- freedom atomistic/stochastic simulator. Note also that here the dependent 
variables x and y are known (as are the corresponding evolution equations). For high dimen- 
sional problems with a low-dimensional effective description, selection of such good reduced 
variables (observables) is nontrivial; we will briefly return to this in the Discussion. 

We start with a simple illustrative example: the Miiller-Brown potential energy sur- 
facel^, which is often used to evaluate landscape search methods since the minimum energy 
path (MEP) between its minima deviates significantly from the chord between them. We 
focus here on reverse ring evolution starting around a local minimum in the landscape and 
approaching the closest saddle point as the ring samples the well. 



The potential is given by 
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V{x, y) = '^Ai exp [ai{x - + bi{, 



x-x^){y-y^) + c.{y-y!f] 



(2.2) 



1=1 



where A = (-200, -100, -170, 15), a = (-1, -1, -6.5, 0.7), b = (0, 0, 11, 0.6), 
c = (-10, -10, -6.5, 0.7), x° = (1, 0, -0.5, -1), and y° = (0, 0.5, 1.5, 1). The neighborhood 
of the Miiller-Brown potential we explore is shown in Figure 3 along with a listing of the fixed 
points, their energy, and their classification. We first discuss the initialization of the ring, 
and then three different forms of "backward stepping" : time-stepping, arclength-stepping in 
(phase space) X (time) and potential-stepping. Our initial ring will be the V = —105 energy 
contour surrounding the minimum at (0.62,0.03). 

A ring is a smooth curve here in two dimensions. In our implementation, we dis- 
cretize this curve and denote the instantaneous position of the discretized ring by the vectors 
$j = $(aj,t) = [x{ai,t),y{ai,t)] (with $j in M^, at in M) for the coordinates of the i*^ dis- 
cretization node, where is a suitable parametrization. A natural choice is the normalized 
arc-length along the ring with aj G [0, 1], as in the string method, but now with periodic 
boundary conditions. Note that one does not need to initialize on an exact isopotential 
contour; keeping the analogy with local stable manifolds of a dynamical system fixed point, 
one can use the local linearization - and more generally, local Taylor series - to approximate 
a closed curve on the manifold. Anticipating the "energy-stepping" reverse evolution mode, 
however, we start with an isopotential contour here. This requires an initial point on the 
surface; we then trace the isopotential contour passing through this point using a scheme 
which resembles the sliding stage in the "Step and Slide" method of Miron and Fichthorn'^ 
for saddle point identification. We simply "slide" along the contour to generate a curve F, 
moving (in some pseudo-time r) perpendicular to the local energy gradient according to 



dT 

dr 



dV/dy 



(2.3) 



dV/dx 
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Points along the curve T provide initial conditions for ring nodes. Figure 4a illustrates ring 
initialization starting "in the well" , close to the isolated local minimum, resulting in a closed 
ring. 

We note that our approach is closely related to established landscape search techniques 
based on following Hessian eigenvectord^'SIISIMiZIlE j^ere the computation is performed in 
a dynamical systems setting: we use a dynamic simulator to estimate time derivatives (and 
through them local potential gradients) on demand. 

2.2 Modes of Reverse Ring Evolution 
2.2.1 Time Stepping 

When every point on a curve evolves backward in time, it makes sense to consider the 
evolution of the entire curve in the direction of the component of the energy gradient normal 
to it (as also happens for forward time evolution in "string" methods, commonly used to 
identify minimum energy paths (MEPs)'^. Ring nodal evolution is given by 

^ = -(VV(<l>.))^ + rf (2.4) 

where T is the unit tangent vector to $ at ^j, with T = evaluated at $j, and r 

is a Lagrange multiplier field'^Sl (determined by the choice of ring parametrization) used to 
distribute nodes evenly along the ring. The component of potential gradient normal to the 
ring (VV")-*- is defined as follows 

{VVy = VV - {VV ■ f)f = VV - {VVf (2.5) 

where (VK)" is the component of the gradient parallel to the ring. For the general case where 
{W{^i))^ is unavailable in closed form (e.g. the inner integrator is a black-box timestepper) 
we use (multiple short replica) simulations for each discretization node on the ring to estimate 
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it, as will be discussed in Section[3j In practice, the tasks of node stepping and redistribution 



are often split into separate stages. The term involving r in equation (2.4) is first omitted, 
and nodal stepping is performed solving, backward in time, the A^-node spatially discretized 
form 

^ = F(<l>„t) = -Vy(<l>,)^, ^ = 1,2,..., AT (2.6) 

where $i denotes the position of node i in the discretized ring. The normalized arc-length 
coordinate ai associated with the z*'* node is approximated using the linear distance formula 



a, = — = (2.7) 



■I'r- 



where ($^, $f) are the coordinates of node i. Periodicity of the ring (which has N — 2 distinct 
nodes) is imposed by the set of algebraic equations 

($0t = {^N-2+i)t (2.8) 

where evaluation at time t is indicated by the subscript outside the parentheses. An explicit, 
backward in time, Euler discretization for the N — 2 distinct nodes reads 

i^^)t^At = i^^)t-AtF{^i,t), 1 = 2,3,. ..,N-1. (2.9) 

Backward stepping in time is followed by a redistribution step that slides nodes along the ring 
so that they are equally spaced (or, generally, spaced in a desirable manner) in the normalized 
ring arclength coordinate. These two basic steps are also present in the (phase space x 
time) arclength or potential stepping of the ring discussed below; they are schematically 
summarized in Figure 5. 

Figure 6 shows snapshots of the ring as it evolves backward in time - in the time-stepping 
mode - on the Miiller-Brown potential. The ring quickly deviates from isopotential contours 
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as it climbs up the well. The local speed is proportional to the local component of the 
potential gradient normal to the ring; wide variation in nodal speeds causes the ring to 
evolve unevenly, elongating along the directions of steepest ascent. Initially equi-spaced ring 
nodes would, if not redistributed, rapidly converge towards regions of high potential gradients 
in our parametrization, resulting in poor resolution in other areas. Even the redistribution 
of nodes, however, will not suffice to accurately capture the ring shape as the ring perimeter 
quickly grows, unless new nodes are added. 

2.2.2 Arc Length Stepping 

Integration with respect to arclength in (phase) space x time is a well known approach 
for problems where some of the dependent variables change rapidly with the independent 
variable (time). Johnson et al.'^ used this vector field rescaling to offset the concentration 
of flow lines in computing two-dimensional invariant manifolds of vectorflelds whose flxed 
points have disparate eigenvalues. Ring evolution by integration along the solution arc s is 



used here by transformation of the independent variable for the system in equation (2.4). 
The required transformation relationH^l is 



dt 
ds 



d^y 

"dT ^ \~dt' 



'1/2 

= F,($„t) (2.10) 



with coordinates ($f , $f ) for node i. The transformed nodal evolution equation, with solu- 
tion arclength as the independent variable, is given by 



d^i ( dt 



ds dt \ ds 



F($„t)F,(<l>,,t), z = 2,3,...,iV-l (2.11) 



where F is as deflned in equation (2.6), and the ring boundary conditions remain periodic. 

In such an arc-length stepping mode, the ring evolution for our potential (Figure 7) 
is more robust to potential gradient nonuniformities. However, ring growth now does not 
couple to the topography of the landscape: in Figure 7b it "sags" along the y-direction and 
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there is considerable variation of potential values along any instantaneous ring state. 



2.2.3 Potential Stepping 

Evolving in constant potential steps enables the ring to directly track isopotential contours 
of the landscape. Potential stepping is shown schematically in one dimension in Figure 8 
for a potential minimum bracketed by a sharp incline on one side and a more gradual one 
on the other. A (reverse) step in the potential results in small variations in the a;-variable 
((Ax)i, (Ax)2) when the "terrain" is steep and in large x increments ((Ax)3, (Ax)4) when it 
is shallow. A qualitatively different approach is that of Laio and Parinellol^ who employed a 
history-dependent bias as part of free energy surface searching that "fills" free energy wells; 
using repulsive markers actively prevents revisiting locations during further exploration. 
Irikura and Johnson'^ used a combination of steps parallel and perpendicular to the energy 
gradient in a version of isopotential searching to identify chemical reaction products from a 
reactant configuration. 

Here we directly transform the independent variable of the evolution equations using the 
chain rule 



dt \ 
dV 



dV d<!>f ^ dV d<!>r~^ 



= Fv{<^^,t) (2.12) 



9$^ dt dt 

so that, as long as the quantity above is finite (e.g. away from critical points), the ring 
evolution equations now become 

d^i d^i f dt\ , , , , , , 

: =F($,,t)Fy(<l>„t), 1 = 2,3,. ..,N-1. (2.13) 



dV dt \dV J , 

Noting that Fy($j,t) ^ cxd in regions where the potential is "fiat" {dV/dt 0); we impose 



an upper limit on the change in the variables $j at each step of (2.13) when the threshold is 
exceeded. 

Potential-stepping ring evolution on the Miiller-Brown potential is shown in Figure 9: the 
ring efficiently rises within the well and successive rings are indicative of the topology of the 
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local landscape. The energy well is "sampled" evenly, tracking the potential contours. The 
almost linear segment of the ring visible in the final snapshot in Figure 9a is formed as the 
ring approaches the (stable manifold of the) saddle point on the potential at (0.21, 0.29); no 
further uphill motion, normal to the ring, is possible in this region. When such a situation is 
detected, one actively intervenes and modifies the evolution to assist the landscape search; 
examples of this will be given below. 



2.3 Adjacent Basins 



The reverse integration for the example in Section 2^ consisted of initialization close to the 
bottom of a single well, ring evolution uphill, and approach to the neighboring saddle point. 
We now discuss a reasonable strategy for transitioning between neighboring energy wells. 

Figure 10 shows the results of reverse ring integration for 3 different initial rings, one 
close to the bottom of each of the wells of the Miiller-Brown potential. Reverse integration 
here maps out the basin of attraction of each of the wells. For each initial condition, the 
reverse integration "stalls" in the vicinity of neighboring saddle points and ring nodes "flow 
along" the stable manifold of the saddle. As the ring nodes approach a saddle point the 
component of the energy gradient normal to the ring ((VF($i))"'") starts becoming negligible. 
To examine transitions between neighboring basins on the landscape we can employ Global 
Terrain methods'^ that exploit the inherent connectedness of stationary points along valleys 
and ridges on the landscape. Figure 11 indicates the basins of attraction for each of the 
minima (identified using reverse integration) along with a red curve, which connects points 
that minimize the gradient norm along level curves of the potential (a minimum energy 
path). This information is accumulated as the ring integration proceeds and suggests the 
direction to follow to locate neighboring minima. Upon detecting a local stagnation of ring 
evolution, caused by the approach to a saddle, a simple strategy is to (a) perform a local 
search for the saddle, through a fixed point algorithm, (b) compute the dynamically unstable 
eigenvector of this saddle, and (c) initialize a downhill search on the "other side" along this 
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eigenvector away from the saddle point. This search for the nearby minimum may be through 
simple forward simulation, or (in a global terrain context) by following points that minimize 
the gradient norm along level potential curves as above. This leads to the detection of 
a neighboring minimum, from which a new ring can be initialized and a further round of 
reverse integration performed. We reiterate that the procedure described so far (for purposes 
of easier exposition) is only for two-dimensional, deterministic landscapes. 



3 Illustrative Problems for Effective Potential Surfaces 



In this section we present coarse reverse integration using effective potential stepping for three 
"noisy" problems: a system of SDEs, a Gillespie-type stochastic simulation algorithm, and 
a molecular dynamics problem (alanine dipeptide in water). We assume that the problems 
we consider - in the regime we study them computationally - may be effectively modelled 
by the following bivariate Stochastic Differential Equation (SDK) (all the examples studied 
are effectively two-dimensional) 



dX = d 







Vl 


[X) 




D 







Wit 






dt + 






d 


X2 




V2 


[X) 







D 




W2t 



(3.1) 



where Vi{X) and V2{X) are drift coefficients, the diffusion matrix D is proportional to the 
unit matrix 6ij with D = D6ij (a "scalar" matrix), where D is a constant, and Wu and W2t 
are independent Wiener processes. We previously considered (Section |2.2[ ) a deterministic 
example where numerical estimates for potential gradients were used to implement potential 
stepping. In the deterministic case, the drift coefficients are equal to minus the gradient of 
a potential V. For stochastic problems, such as those considered in this section, the drift 
coefficients are not so simply related to the gradient of an effective (generalized) potential 
(see the Appendix for additional discussion of this for 1-dimensional stochastic systems). In 
general, for reverse integration with steps in effective potential, we require estimates of all 
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drift coefficients and all entries in the diffusion matrix (and even their partial derivatives) 



Here we discuss effective potential stepping for a system of the form given in (3.1) and 
also briefly discuss the general case where entries of the diffusion matrix are non-zero and 
dependent on X. 



We assume equation (3.1 ) exists but is unavailable in closed form; estimates are therefore 
obtained by observing the process X and using vi{X) = \imAt^o{[AXi]) / At,V2{X) = 
limAt^o([AX2])/At, and 2D = limAt->o([AXi]')/At = limAt^o([AX2]')/At. Here AX^ = 
Xi{t + At) - Xi{t) and, by the form of equation (|3l|), limAt-.o(AXiAX2)/At = 0. 

These formulas, especially the ones for the drifts, suggest the construction of a useful 
coarse "pseudo-dynamical" evolution for our ring - a coarse evolution that follows the poten- 
tial of mean force. The simplest version of these pseudo-dynamics evolves each point on the 
ring based on the local estimated drift - for the constant "scalar" diffusion mentioned above 
this evolution follows the potential of mean force (PMF), and it becomes a true dynamical 
evolution at the deterministic limit. 



For a black box code implementing equation (3.1 ) this involves initializing at X, running 
an ensemble of realizations of the dynamics for a short time 6t, estimating the local drift 
components of the SDE using the above formulas, performing a (forward or backward) 
projective step At in time (AXj = Vi{X)At), and repeating the process. 

We will argue that this accelerated pseudo-dynamical evolution (which, we emphasize, 
does not correspond to realizations of the SDE itself) can assist in the exploration of effective 
potential surfaces. The easiest approach would be to use reverse time-stepping, or reverse 
arclength stepping in these pseudo-dynamics, and then (using formulae that will be discussed 
below and in the Appendix) finding the effective potential corresponding to each node visited. 
It is also possible, as we will see, to directly make "upward" steps in the effective potential; 
indeed, for the constant diffusion coefficient case we are studying, a proportionality exists 
between backward steps in time (for the pseudo-dynamics based on the drifts) and upward 
steps in the effective potential. 
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In the SDE case (Section 3.1 ) we only allow ourselves to observe sample paths generated 
by short bursts of the SDE solver; the SDE solver itself is treated as a black box (similarly 
for the Gillespie and MD simulators). A simple approach to estimating effective potential 
gradients (and eventually free energy gradients) is to perform sets of M-replica bursts of 
inner (SDE, Gillespie, MD) simulation initialized at each of the ring nodes. For short 
replica simulation bursts (with n time steps), we can assume a local first order in time 
modell^Sl for the mean x (an n x 2 matrix, with entries averaged using multiple replicas, rows 
corresponding to time abscissas, and columns corresponding to each coarse variable) 



x = iC + e (3.2) 

where t = [1 t] is a x 2 matrix, 1 is a vector of n ones, t is a vector of time abscissas, 
e is the n x 2 matrix of model errors, and C is the 2x2 matrix of parameters computed 
(for each node) using least squares estimation. The (pseudo-time) derivative information (in 
the matrix C) is required, along with approximations of the tangent vectors at each node 
(ring geometry) to update the ring node positions in a reverse integration step; diffusion 
coefficients are also required, as discussed further below, to compute the relation between 
a reverse integration step size in pseudo-time and the corresponding change in the effective 
potential. In the remainder of the paper reverse ring time-stepping is always meant in terms 
of the drift-based pseudo-dynamics (it only becomes true time-stepping at the deterministic 
limit). 

This derivative information may also be used to confirm the existence of an effective 
potential. For the case of two effective coarse-dimensions, we locally compare, computing on 
a stencil of points, the X2- variation of dXi/dt with the Xi-variation of (1X2/ dt (testing for 
equality of mixed partial derivatives of the effective potential). Alternatively, we may use a 
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locally affine model for the drift coefficients of the following form 



V2{X) 



AX + B 



(3.3) 



with A G M^^^, S G M^, and employ maximum likelihood estimation techniques to compute 
A and B (an effective potential exists provided A12 = A21). In this context, recently devel- 
oped maximum likelihood'^^ or BayesianHSI estimation approaches are particularly promising, 
allowing for simultaneous estimation of both the drift and diffusion coefficients. These ap- 
proaches assume that the data are generated by a (multivariate) parametric diffusion; they 
employ a closed-form approximation to the transition density for this diffusion. For the case 
of a one-dimensional diffusion process X 



dX = fx{X; 9)dt + a{X; e)dWt 



(3.4) 



where Wt is the Wiener process, is a parameter vector, /i is the drift coefficient, and a is 
the diffusion coefficient, the corresponding log likelihood function ln{0) is defined as 



(3.5) 



1=1 



where n is the number of time abscissas, X^a is the i*^ sample, and A is the time step 
between observations in the time series. The derivation of a closed-form expression for the 
transition density pj^ (and thereby the log likelihood function) allows for maximization of 
In with respect to the parameter vector 6 providing "optimal" estimates for the drift and 
diffusion coefficients associated with the time series. For higher-dimensional problems (such 
as the two-dimensional ones considered here) see Ref. 48. 



If the system in (3.1), with "scalar" diffusion matrix, has drift coefficients that satisfy 
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the following potential condition 



it follows that the probability current vanishes at equilibrium, the drift coefficients (the 
time-derivatives in our ring pseudo-dynamics) satisfy 

V. = -D^^, (3.7) 

and the difference in effective generalized potential (free energy) between a reference state 
(X^jXg) and the state (Xi,X2) may be directly computed from the following line integral^Sl 

PAE"^ = -D-^ (j \i{X[,X^)dX[ + j \2{Xi,X'^)dX'A . (3.8) 



^1 " ^"-2 



The analogy with the deterministic case (eqs.(2.12) and (2.13)) carries through: the esti- 
mated drifts are proportional (via the constant D) to the effective potential gradients, and 
evolution following the drifts directly corresponds (modulo the proportionality constant) to 
evolution in the effective potential (PMF). Estimates of the local effective diffusion coeffi- 
cients are typically necessary for exploration of the effective potential surface. We note that 



for a diagonal diffusion tensor with identical entries, (3.1), the size of the step (ilS.E is 



scaled (in (3.8)) by the diffusion constant D. It follows that estimation of only the drift 
coefficients v\{X.^ and f2(^) allows us to perform reverse integration in our coarse dynam- 
ics (associated with the potential of mean force). A backward in time step At, leading to 
the state change AXj = fj(X)At, is, in effect, an "upward" step in the effective potential 
with the (unknown) scaled stepsize DjSAE'^^. This approach is analogous to (and, in the 
appropriate limit will approximate) the deterministic potential stepping previously described 



(Section 2.2). Here, for a stochastic problem, we need to additionally estimate diffusion co- 



efficients to compute the potential change associated with each ring step uphill and, thereby. 
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the effective free energy change associated with each ring. 

For the general diffusion matrix D{X), with all entries possibly non-zero and dependent 
on X, we would compute the following partial derivatives of the effective potential 



1 idD2i , dD22 



+ + (3.9) 



d{PE^^) _ 1 (dD,, dD,2 \ 



+ + (3.10) 



dX2 

dXi 0X2 

and test whether the following potential condition is satisfied!^ 

dAijX) ^ dA2{X) 
8X2 9X1 ■ 



(3.11) 



If these potential conditions are satisfied then the effective generalized potential (free energy) 
may again be directly calculated from the following line integral'^Sl 

j3E'^{Xi,X2) = (3E''^{Xl,Xl) + Ai{X[,X^)dX[+ / ' A2(Xi,X^)dX^. (3.12) 



We do not consider the case when equation (3.11) does not hold; we refer the reader to 
Ref. 49. 



In the same spirit with reverse ring stepping in potential (Section |2.2 ), reverse ring 
stepping in effective potential may also be accomplished, subject to the stated assumptions, 
using the inner integrator as a black-box: we run multiple replicas for particular initial 
conditions (the positions of nodes in the ring), observe (inner) forward time evolution, and. 



for a "scalar" diffusion matrix, use the estimated drifts and (3.8) to approximate changes 
in the effective potential numerically. We note that for a constant and isotropic diffusion 
tensor if we estimate only the drift coefficients we can still perform reverse ring stepping 
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in the correct uphill direction and follow isopotential surfaces but the actual step size (and 
thus the actual value of the potential on the isopotential surfaces) will be unknown. As 
reverse ring integration proceeds, we store all calculated effective gradient values at each set 
of coarse variable values, thereby building a database. Smoothed gradient estimates may be 
obtained for each ring node by using a weighted gradient average that includes estimates at 
nearby coarse variable values in the database; we use kernel smoothing*^ to select appropriate 
weights. For the more general case of state-dependent diffusion the drift dynamics do not 
simply correspond to dynamics in the effective potential (see the Appendix for corrections to 



d^i/dt required to retain the analogy to the deterministic equations (2.12) and (2.13)). One 
could still employ the uncorrected drift dynamics as an ad hoc search tool (especially for 
problems close to "scalar" diffusion matrices) and post-compute the effective potential values 
the ring visits. In this case, however, the time-parametrization of the effective potential 
evolution will not be meaningful, and will even dramatically fail in the neighborhood of drift 
steady states that do not correspond to critical points in the effective potential (and vice 
versa) . 



3.1 A Stochastic Differential Equation Example 

In this section we consider ring evolution in potential-stepping mode for a system of stochastic 
differential equations (SDEs). Reverse ring integration is performed at the outer level. The 
inner routine here is a forward-in-time SDE (Euler-Maruyama) integrator based on which 
we generate the nodal gradient estimates required by the outer ring integrator. The SDE 
system is given by 

dx{t) = D^F^{t) dt + dWit (3.13) 

dy{t) = DyFy{t) dt + dW2t (3.14) 
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where = Dy = D = 1, = Fy = — f^, and the function V{x, y) is given by 

y (x, y) = 10{x^ -lf + 2x + ^{y- xf. (3.15) 
The discretized (using the Euler-Maruyama scheme) system of equations is as follows 

x{t + At) = x{t) + DF^At + V2DAt Af{0, 1) (3.16) 

y{t + At) = y{t) + DFyAt + V2DAt ^^{0, 1) (3.17) 

where A/'(0, 1) is a normal random number with mean and unit variance. We initialize 
the ring on an isopotential contour about the minimum at (—1.024, —1.024). The ratio of 
eigenvalues at this fixed point is approximately 90 - the well is sharply elongated in the 
y-direction. To cope with this sharp elongation, we adaptively adjust the distribution of 
ring nodes so that they remain concentrated in regions where the ring curvature is largest 
(we did not adaptively change the number of nodes here). 

The results of ring evolution (following drifts only) in a single well for the SDE problem are 
shown in the left panel of Figure 12 (contour lines are shown for V{x, y)). Here we plot ring 
nodal positions at every reverse integration step in drift potential. Here, the selected diffusion 
coefficients {D^ = Dy = D = 1) and the functional form of the drift and diffusion terms in 



eqs.(3.13) and (3.14) necessarily imply that (3 = 1- the effective potential is essentially the 
same as the drift potential (see also the Appendix). Reverse integration eventually stalls in 
the vicinity of the saddle point at (0,0). In Figure 12 (right panel) we show the estimated 
effective potential associated with ring nodes superimposed on 3D contour lines for V{x,y). 
Estimates of both local drift and diffusion coefficients are used to compute effective potential 



differences (using equation (3.8)) for successive rounds of reverse integration (as generated 
by potential stepping, shown left). The effective potential is computed relative to that of 
the initial condition (a ring on an isopotential contour). 
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As the local curvature of the landscape changes, the duration of our short inner compu- 
tation bursts (the time interval over which we collect data to estimate derivatives) should 
be adaptively modified for computational accuracy. 



3.2 A Gillespie— type SSA inner simulator example 

The stochastic description of a spatially homogeneous set of chemical reactions, which treats 
the collisions of species in the system as essentially random events, is based on the chem- 
ical master equationSSl. The Gillespie Stochastic Simulation Algorithm (SSA) is a Monte 
Carlo procedure used to simulate a stochastic formulation of chemical reaction dynamics 
that accounts for inherent system fluctuations and correlations - this procedure numerically 
simulates the stochastic process described by the spatially homogeneous master equationSSl. 
At each step in the simulation a reaction event is selected (based on the reaction probabili- 
ties), the species numbers updated (according to the stoichiometry of the reactions) and the 
time to the next reaction event computed. The reaction probabilities used in the algorithm 
are determined by the species concentrations and reaction rate constants as described in 
Ref. 52. The inner stochastic simulation routine we use here happens to employ an explicit 
tau-leaping scheme that takes larger time steps to encompass more reaction events while 
still ensuring that none of the propensity (reaction probability) functions in the algorithm 
changes significantly'^. The reaction events we simulate are chosen to implement a mecha- 
nism which, at the limit of infinitely many particles, would be described by the deterministic 



gradient system with potential V{x,y) defined in equation (3.15). 
Consider the following deterministic rate equations 

dec 

— = —kix + k2X^ — ksx^ + — k^x + k^y (3.18) 

^ = k^x - key + kj. (3.19) 
at 

This set of deterministic (coarse) rate equations may be written, for this problem, in the 
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form of the following gradient system 



_ = -Vy*(x,y) (3.20) 

where x may be interpreted, here, as a vector of chemical species concentrations, and the 
potential energy function V*{x,y) is given by 

V\x, y) = x'-'^x'+'^x'-hx- hxy + 'f y' - k,y + h (3.21) 

with 

h = h- (3.22) 

Values for the rate constants are selected by requiring V*{x,y) = V{x — 5,y — 20) (i.e. 
V*{x,y) is selected as a shifted version of the V{x,y) from the previous example, with its 
fixed points in the positive xy quadrant, in an attempt to enforce positivity of the reaction 
probabilities required by the Gillespie algorithm). The rate constant values chosen are 
ki = 2960,^2 = 600, A;3 = 40,^4 = 4783, = k^ = l,k-r = 15. This models the following, 
hypothetical, set of elementary reactions 



X (3.23) 

2X + U^3X (3.24) 

V^X (3.25) 

X (3.26) 

W !^Y (3.27) 



where species X(resp. Y) has concentration x(resp. y), the species T, [/, V, and W are as- 



sumed to have unchanging concentration 1, and the reactions in eqs.(3.25) and (3.27) follow 
zeroth order kinetics. 
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For the number of particles used in this Gillespie simulation, the drift coefficients esti- 
mated from the simulation practically coincide with the right-hand-side of the deterministic 
rate equations, which happen to be embody the gradient of the deterministic potential 



V{x,y) (equation (3.15)). The results of reverse ring integration up this deterministic po- 
tential, with drifts estimated from our Gillespie simulation are shown in Figure 13. The left 
panel shows nodal evolution over 100 rounds of reverse integration. In the right panel we 
superimpose the nodal evolution (with estimated potential indicated by color) on contours 



of the potential V{x,y) (defined in equation (3.15)) for the deterministic gradient system 



in the form of (2.1). Since we are using an explicit tau-leaping Gillespie scheme, we do not 
have accurate estimates of the diffusion coefficients of the underlying chemical Fokker-Planck 
equation'SD. For this problem these entries in the diffusion matrix cannot be well approxi- 
mated as state-independent, and a more involved process that includes their estimation is 
required in order to construct the true effective potential. 



3.3 Alanine dipeptide in Water 

In this section we study the coarse effective potential landscape of alanine dipeptide (i.e. A^- 
acetyl alanine iV'-methyl amide) dissolved in water using coarse reverse (effective potential- 
stepping) integration. This system is a basic fragment of protein backbones with two main 
torsion angle degrees of freedom {C — N — Ca — C) and ip {N — Ca — C — N), and with polar 
groups that interact strongly with each other and with the solvent. Extensive theoretical and 
experimental investigation of the alanine dipeptide has suggested good coarse observables 
(dihedral angles) for this system'S2S3l5l_ 

Figure 14 shows the effective free energy landscape as a function of the dihedral angles 
and ip of the alanine dipeptide. The structures of the alanine dipeptide in the a-helical 
{ip = —0.3) and extended {ip = ^) states (corresponding to minima on the landscape) 
and at the transition state between them are also shown. We will use reverse integration 
on the effective potential energy landscape parametrized by these coarse coordinates. The 
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coarse reverse integration is "wrapped around" a conventional forward-in-time molecular 
dynamics (MD) simulator. It provides protocols for where (i.e. at what starting values 
of the coarse variables) to execute short bursts of MD, so as to map the main features of 
the effective potential surface (minima and connecting saddle points). These short bursts 
of appropriately initialized MD simulations provide (via estimation of the coefficients in 



equation (3.1)) the deterministic and stochastic components of the alanine dipeptide coarse 
dynamics parametrized by the selected coarse variables. The current work assumes a diffusion 



matrix (equation (3.1)) that is diagonal with identical constant entries. Our MD simulations 
of the alanine dipeptide in explicit water are performed using AMBER 6.0 and the parm94 
force field. The system is simulated at constant volume corresponding to 1 bar pressure, and 
the temperature is maintained at 300K by weak coupling to a Berendsen thermostat. All 
simulations use a time step of 0.001 ps. The "true" effective potential here is the one obtained 
from the stationary probability distribution as approximated by a long MD simulation (24 
ns). 

A preparatory "lifting" step is required at each reverse integration step for each ring node. 
Each coarse initial condition is lifted to many microscopic copies conditioned on the coarse 
variables and ip. This step is not unique, since many distributions may be constructed 
having the same values of the coarse variables. Here we lift by performing a short MD run 
with an added potential V'^"'^^^'^ that biases (as in umbrella sampling) the coarse variables 
towards their target values (?/^*'^''s,0*^''§). 



\/'=°-tr = k4^_ ^t-s)2/2 + A;^(0 _ 0t-g)2/2 (3.28) 

with = = 100 kcal mol~^rad~^. The short lifting phase provides sufficient time for the 
fast variables to equilibrate following changes in the coarse variables. Following initialization 
we run and monitor the detailed MD simulations over short times (0.5ps) and estimate, for 
each node in the coarse variables, the local drifts over multiple replicas. Each coarse backward 
Euler step of the ring evolution provides new coarse variable values at which to initialize 
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short bursts of the MD simulator. Each step in the reverse integration procedure consists of 
hfting from coarse variables (the coordinates of the ring nodes) to an ensemble of consistent 
microscopic configurations, execution of multiple short MD runs from such configurations, 
restriction to coarse variables, estimation of coarse drifts and diffusivities, and reverse Euler 
stepping of the ring in the chosen evolution mode. 

Figure 15 (left panel) shows ring nodes for 30 steps of reverse ring integration (using 
N—12 nodes) initialized around the extended structure minimum. Successive rings evolve 
up the well and are representative of the well topology. Reverse integration stalls, as ex- 
pected, at the saddle points neighboring the extended structure minimum and identifies 
candidate saddle points in these regions. We note that, in the current context of (assumed) 
constant diffusion coefficients we can think of these saddles as steady states of the set of 
deterministic ODEs, coinciding with the drift terms of the effective Fokker-Planck. Then 
the "dynamically unstable" directions in a saddle (the downhill ones) are characterized by 
positive eigenvalues of the Jacobian of the drift equations; yet since these equations are pro- 
portional to the negative of the gradient of a potential, positive eigenvalues of the dynamical 
Jacobian correspond to negative eigenvalues of the Hessian. The eigenvectors associated with 
the unstable (for our PMF-related coarse dynamics) eigenvalue at these candidate saddles 
are also indicated in Figure 15 and suggest the directions to dynamically follow to locate 
neighboring minima. We perturbed in the direction of the unstable eigenvector (associated 
with positive eigenvalue) away from one of the candidate saddle points and initialized (using 
a constrained potential, as before) multiple MD runs from this location. In Figure 15 (right 
panel) we plot the observed evolution from these initial conditions down into the basin of 
the adjacent a-helical minimum. 

In Figure 16 we show reverse ring evolution initialized close to both ct-helical and extended 
minima. Clearly reverse ring evolution in this a-helical minimum well takes larger steps 
in 0, in which direction the effective potential is shallowest. We repeat that the reverse 
integration steps correspond to constant steps in free energy only if the effective diffusion 
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tensor is diagonal and constant in both directions. The ring evolution shown in Figure 16 
appears to accurately track equal free energy contours suggesting that these assumptions 
(on the form of the diffusion tensor) are a suitable approximation here. 

4 Summary and Conclusions 

We have presented a coarse-grained computational approach (coarse reverse integration) for 
exploration of low-dimensional effective landscapes. In our two-coarse-dimensional examples 
an (outer) integration scheme evolves a ring of replica simulations backwards by exploiting 
short bursts of a conventional forward-in-time (inner) simulator. The results of small periods 
of forward inner simulation are processed to enable large steps backward in time (pseudo- 
time in the stochastic case), in phase space x time, or in potential in the outer integration. 
We first illustrated these different modes of reverse integration for smooth, deterministic 
landscapes. We extended the most promising approach for an illustrative deterministic 
problem, isopotential stepping, to relatively simple noisy (or effectively noisy) systems where 
closed-form evolution equations are not available. Simple estimation techniques were applied 
here to the results of appropriately initialized short bursts of forward simulation used locally 
to extract stochastic models with constant diffusion coefficients. Reverse integration in a 
single well and the approach to/detection of neighboring coarse saddles was demonstrated. 
A brief discussion of Global Terrain approaches for exploring potential surfaces was included, 
along with a short demonstration of linking our approach to them. 

We have presented here ring exploration using an effective potential, using only estimation 
of the drift coefficients of our effective coarse model equations. Estimation of the diffusion 
coefficients (and their derivatives) is additionally required to quantitatively trace the effective 
potential surface. More sophisticated estimation techniques'^^'^ allow for reliable estimation 
of both the stochastic and deterministic components of the coarse model equations. This 
permits a quantitative reconstruction of the effective free energy surface (and thereby the 
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equilibrium density) using our reverse integration approach. The latter reconstruction is 
possible provided that the potential conditions discussed in Section |3] hold; testing this 
hypothesis should become an integral part of the algorithm. 

In studies of high-dimensional systems, a central question is the appropriate choice of 
coarse variables used in the reverse integration. For high-dimensional systems, such as those 
arising in molecular simulations, the dynamics can typically be monitored only along a few 
chosen "coarse" coordinates. Formally, an exact evolution equation can be derived for these 
coordinates with the help of the projection-operator formalism^^, but that equation will 
be non-Markovian even if the time evolution in the full space is Markovian. To minimize 
the resulting memory effects, one can attempt to identify good (i.e., nearly Markovian) 
coordinates a priori, e.g., based on the extensive experience with the problem (as, say, in 
hydrodynamics) or by data analysis'^^'^. Alternatively, one can monitor the dynamics in a 
large space of trial coordinates and select a suitable low-dimensional space on the fly (e.g. 
from Principal Component Analysis'^. In general problems, where good coordinates are not 
immediately obvious, careful testing of the Markovian character of the projected dynamics 
on the time scale of the coarse forward or reverse integration will be an important component 
of the computation'2SEQ]_ 

For the alanine dipeptide in water many-degree-of-freedom example, we assumed that 
the effective dynamics could be described in terms of a few coarse variables known from pre- 
vious experience with the problem: the two dihedral angles. We are also exploring the use 
of diffusion map techniques'^ for data-based detection of such coarse observables, in effect 
trying to reconstruct FigjO] without previous knowledge of the dihedral angle coarse vari- 
ables. An example of mining large data sets from protein folding simulations to detect good 
coarse variables using a scaled Isomap (ScIMAP) approach can be found in Ref. 64; linking 
coarse variables with reverse integration for this example is discussed further in an upcom- 
ing publication®^. All the work in this paper was in two coarse dimensions. In the context 
of invariant manifold computations for dynamical systems (which provided the motivation 
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for this work) more sophisticated algorithms exist for the computer-assisted exploration of 
higher- dimensional manifolds (as high as 6- dimensional It should be possible - and 
interesting ! - to use these manifold parametrization and approximation techniques in com- 
bination with the approach presented here, to test the "coarse dimensionality" of effective 
free energy surfaces one can usefully explore. 
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Appendix 

Stationary Probability Distribution and Effective Free Energy 

We discuss here the effective potential (effective free energy E^^ltp)) we attempt to com- 
pute through reverse integration and its relation to the form of the stationary probability 
distribution Pgti'ip) for a 1-dimensional Fokker Planck equation (FPE). 

In 1-D we write the FPE (with drift 

v{W = (A-1) 



and diffusion coefficient 



DiM = l'-^, (A-2) 



where ipit', V'o) is a sample path of duration t initialized at ipQ when t = and where cr'^iipo, t) 
is the variance of ip{t; ipo)) as follows: 



dt 



2 



^«'.)^-^ (A-3) 



where the probability current S{ilj,t) is given by 

Sii:, t) = v{ij)Piij, t) - {d/dij)D{ij)P{ij, t). (A-4) 

In 1-D, the stationary probability distribution corresponds to a constant probability cur- 
rent'^, for natural boundary conditions this constant is zero and stationary solutions of the 
FPE satisfy 

v{ij)PstW - {d/d^)D{^)PstW = (A-5) 
which is readily solved for (the logarithm of) Pgt 
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InPsM) = -InDM) + / TTTTTciV^' + const. (A-6) 

J D[i)') 

The connection between the stationary probabihty distribution and the effective potential 
(effective free energy) for systems with a characteristic temperature/energy scale (given by 
the parameter f3~^ = ksT), is provided by the ansatz Pstiip) oc e~^^''^^'^\ Substitution of 



the ansatz into eq.(A-6) gives 



In Section 3, after the fitting of model SDEs, we discussed the use of local estimates of 
the drift and diffusion coefficients in taking steps in some form of the effective potential for 
3 example systems; we consider the basis of this approach here in 1-D. For both the SDE 
and Gillespie problems of Section 3 reverse ring stepping results were compared to particular 



deterministic potentials V{^) (eq.(3.15)). For the alanine dipeptide problem the results of 
reverse ring stepping were compared to an effective potential derived from the stationary 
probability distribution of the system (with the additional assumption of state-independent 
diffusion coefficients). 

If, alternatively, we start from the Langevin equation 

V^ = -7W^ + /o(^) + r(t). (A-8) 

where "^{ip) is the friction coefficient, fo{ip) is a deterministic force (minus the gradient of 
a deterministic potential function V{;ip)), iW"^ is a drag force, and r(t) is the stochastic 
force, and take the high friction (overdamped) limit we obtain 



The fluctuation-dissipation relation connects (correlations of) the stochastic force to the drag 
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force as follows 



(A-10) 



for a system at "temperature" T (energy scale ksT = [3 ^). Using Ito calculus we interpret 



eq.(A-9) as 



with 



7(?/') 



(A-11) 



/o(^) 1 dVji,) 



(A-12) 



7(V^) /?7(V^)' 

This establishes a correspondence of the Langevin equation with the FPE in eq.( A-3[ ). 



(A-13) 



For the case of additive noise, where -D(V') = D = const (implying (by eq.(A-13)) that 



7(^/') = 7 = const), we find (by differentiation of eq.(A-7) w.r.t. ijj) that the drift coefficient 
f (?/^) is simply related to the effective potential E'^^ as follows 



d<il)> 



1 dE'^^i^il)) 
7 



-D 



(A-14) 



dt J 'J dip dip 

In this case, pseudo-dynamical reverse integration following drifts (as performed for the 
model SDE problem) coincides with stepping in effective potential (appropriately scaled 
with the constant diffusion coefficient). Using eq.(A-12) in eq.( A-15") ) we find 



d{pvm diPE'^^m 



1 d <ip > 



(A-15) 



dip dip D dt 

For the case of multiplicative (state-dependent) noise the drift coefficient v{ip) is not 
directly related to the gradient of the effective potential E'^^ extracted from the equilibrium 
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density; instead, it satisfies 



where 



d Kip > 
Jt 



1 dE^^i^) ^_^^^^d{f3E^^{^)) 



dip 



dip 



v{ip') 



Dm 



-jrdip' + const. 



(A-16) 



(A-17) 



with PE'^^ differing from PE"^^ by the state-dependent contribution lnD{ip). For such systems 
with state-dependent noise we require (local) estimates of both drift and diffusion coefficients 



for effective potential stepping. These can be used in eq.(A-7) (resp. eq.(A-17)) to compute 
(differences in) the true effective potential E^^ (resp. the "auxiliary" effective potential E°^) 



d{(3E^^{iP)) 
dtp 



1 d<ip> ^ 1 dD{ip) 



D{iP) dt 



D{iP) dip 



(A-18) 



When temperature is not part of the problem description one considers the SDE 



dip = A{ip)dt + B{ip)dWt (A-19) 
which has the following stationary distribution 

Pstm = ;^e^ ^'^'^ , (A-20) 

c being a normalization constant chosen such that f^^Pst{ip')dip' = 1. Local estimates of 
A{ip) and B{ip) can then, in a similar approach as above, be used to step backwards in 
effective potential. 
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Figure 1. Schematic of reverse projective integration. The thick gray hne indicates the 
position on the slow manifold as a function of time on a forward trajectory. The solid circles 
are configurations along microscopic trajectories run forward in time, as indicated by the 
short solid arrows. The long dashed arrows indicate the reverse projective steps, which result 
in an initialization near, but slightly off, the slow manifold. 

Figure 2. Schematic of forward and backward stepping of ring nodes (light circles) in time 
on an energy landscape in the vicinity of fixed point (dark circles). Solid lines are energy 
contours, dashed lines connect ring nodes at each step, and arrows indicate the direction of 
the ring evolution. 

Figure 3. Contour map of the Miiller-Brown Potential for —1 < x < 1, —0.5 < y < 1. 
Contour lines are shown in black (white) for V{x,y) < {V{x,y) > 0). Stationary points of 
the potential, their classification and energy are tabulated for the region illustrated. 



Figure 4. Distribution of nodes produced by integration of equation (2.3) with initial 
condition above (white nodes and contour lines) and below (black nodes and contour lines) 
the saddle point energy. Below the saddle point there is a separation of isopotential contours 
in each well - the saddle point isopotential contours "split" in two. 

Figure 5. Stages of ring evolution: backward stepping (in time At, arc-length As, or 
potential AV), followed by nodal redistribution. 

Figure 6. Reverse time stepping on Miiller-Brown Potential with At = 5 x 10^^, = 80 
(successive rings are shown at intervals of 10 steps and arrows indicate direction of ring 
evolution). 

Figure 7. Arc length stepping on Miiller-Brown Potential with As = 0.01, = 80 
(successive rings are shown at intervals of 10 steps and arrows indicate direction of ring 
evolution). 
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Figure 8. Energy stepping in a smooth, asymmetric ID energy well. 

Figure 9. Potential stepping on Miiller-Brown Potential with AV = 1.45, = 80 (succes- 
sive rings are shown at intervals of 10 steps and arrows indicate direction of ring evolution). 

Figure 10. Potential stepping on the Miiller-Brown Potential with AV = 0.75, N = 160 
(successive (colored) rings are shown at intervals of 10 steps). Successive rings obtained by 
reverse integration starting from each of the minima on the landscape are shown. 

Figure 11. Potential stepping on Miiller-Brown Potential with AV = 0.75, N = 160 
(successive (colored) rings are shown at intervals of 30 steps). Positions (red circles) of the 
minimum in gradient norm along the ring are shown at intervals of 5 steps in ring integration. 
Three different viewpoints of the same ring evolution are shown. Black arrows indicate the 
direction of ring evolution out of each minimum. Top row: 3D view; bottom row: 2D 
overhead view (gray arrows indicate position of views shown in top row). 

Figure 12. Left: 100 rounds of potential-stepping ring evolution using coarse Euler and 
an inner SDE integrator with AV = 5 x 10^^, D = 1, N = 200; a redistribution that 
concentrates nodes in regions of largest ring curvature is performed every 10 reverse ring 
integration steps. The ring is initially centered at (—1, —1). 50 replica runs are performed 
with the SDE integrator, each run for tfot = 0.5 (with time step size At = 2.5 x 10^^), for 



drift estimation at each node. Contours of the function V{x,y) (defined in equation (3.15)) 
are shown. Ring nodes are shown for every step. Right: the effective potential associated 
with each ring node is shown (indicated by color) computed using local drifts and diffusions 



using (3.8) - it is plotted superimposed on the landscape of the potential (3.15). Evolving 
ring nodes with x < —1.2 are omitted for clarity. Points on a single representative effective 
potential contour (using reverse drift-based integration) are plotted as black symbols in the 
V{x,y) = —10 plane at the base of the figure; points along the actual potential contour are 
shown as red symbols. 



39 



Figure 13. Left: 100 rounds of drift potential-stepping ring evolution using an explicit 
tau-leaping inner Gillespie simulator with = 200; nodal redistribution is performed every 
10 reverse ring integration (coarse Euler) steps. The ring is initially centered at (—1,-1). 
50 replica Gillespie simulation runs are performed, each run with 10, 000 particles and the 
explicit tau leaping parameter e = 0.03. For the reverse integration AV = 5 x 10^^. Contours 



of the function V{x,y) (defined in equation (3.15)) are shown. Right: 3D-view of reverse 



ring integration shown left with estimated potential of each node shown in color. Colorbar 
indicates value of V{x,y). Evolving ring nodes with x < —1.2 are omitted for clarity. 



Contours of the function V{x,y) (defined in equation (3.15)) are shown in 3D. Points on a 
single representative potential contour (as computed using reverse integration) are plotted 
as black symbols in the V{x, y) = —10 plane at the base of the figure; points along the actual 
potential contour are shown as red symbols. 

Figure 14. Free energy landscape for the alanine dipeptide in the (f) — ip plane [IksT 
contour lines). Structures are shown corresponding to the right-handed a-helical minimum 
(left), the extended minimum (right), and the transition state between them (middle). 

Figure 15. Alanine dipeptide ring integration. Left panel: Extended structure minimum: 
30 rounds of reverse (coarse Euler) ring integration (number of ring nodes A^=12) with 
scaled effective potential steps. Note that the scaled steps correspond to constant steps in 
free energy only if the effective diffusion tensor is diagonal with identical, constant entries, 
which appears to be a good approximation here. Eigenvectors corresponding to positive 
eigenvalues for candidate saddle points determined from ring integration are shown (long 
red arrows). Right panel: Downhill runs initialized at transition regions suggested by the 
reverse ring integration from the extended structure minimum. Initial conditions (black dots) 
are generated by umbrella sampling at a target coarse point selected by perturbation along 
the unstable eigenvector at the saddle. Final conditions for these downhill runs (purple dots) 
suggest starting points for a new round of reverse integration from the adjacent minimum. 
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(IfcsT contour lines used in both plots). Note that both wells are plotted rotated by 90 
degrees relative to Figure 14. 

Figure 16. Alanine dipeptide in water: 30 rounds of coarse reverse ring evolution (number 
of ring nodes N—12, Df3l\E'^^ — O.OBkBT) initialized in the neighborhood of both the right- 
handed CK-helical minimum (bottom ring), and the extended minimum (top ring). Rings 
(grey lines) connecting nodes (black solid circles) are shown. Colored energy contours are 
plotted at increments of IksT. 
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